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_— , Abstract 

We consider the application of multilevel Monte Carlo methods to elliptic PDEs with ran- 
dom coefficients. We focus on models of the random coefficient that lack uniform ellipticity 
and boundedness with respect to the random parameter, and that only have limited spatial 
regularity. We extend the finite element error analysis for this type of equation, carried out 
in [5], to more difficult problems, posed on non-smooth domains and with discontinuities in the 
coefficient. For this wider class of model problem, we prove convergence of the multilevel Monte 
Carlo algorithm for estimating any bounded, linear functional and any continuously Frechet 
differentiable non-linear functional of the solution. We further improve the performance of 
the multilevel estimator by introducing level dependent truncations of the Karhunen-Loeve 
expansion of the random coefficient. Numerical results complete the paper. 
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> 1 Introduction 

\o 

Monte Carlo type methods are widely used in a range of scientific applications. The dimension inde- 
pendent convergence rate of the sampling error makes these methods attractive for high-dimensional 
^-jl problems, which often can not be approximated well by other types of methods. However, even 
though dimension independent, the convergence rate of conventional Monte Carlo methods is very 
slow, and getting to high accuracies is often not computationally feasible. 

To improve on the convergence of conventional Monte Carlo methods, one can make use of 
.£h a variety of variance reduction techniques, such as control variates and antithetic sampling. A 

^ particular variance reduction technique which has gotten a lot of attention recently, is the multilevel 
Monte Carlo (MLMC) method. It was first introduced by Heinrich [26] for the computation of 
high-dimensional, parameter-dependent integrals, and has since been applied in many areas of 
mathematics related to differential equations. In particular, a lot of research has been done in 
stochastic differential equations [TO], [151 EH EH EH] and several types of partial differential equations 
(PDEs) with random coefficients [2 El El QH Q21 [22] . 

In this paper, we are concerned with the application of multilevel Monte Carlo methods to ellip- 
tic PDEs with random coefficients. In particular, we will focus on rough coefficients, which cannot 
be uniformly bounded in the random parameter and only have Holder continuous trajectories. This 
type of problem arises, for example, in stochastic groundwater flow modelling, where log-normal 
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random coefficients are frequently used. A fundamental analysis of the multilevel Monte Carlo 
algorithm applied to this type of model problem was recently done in [B] , and also [S] demonstrates 
numerically the effectiveness of multilevel Monte Carlo methods applied to elliptic PDEs with log- 
normal coefficients. The purpose of this paper is to extend the analysis to cover more situations of 
practical interest, and to expand on some of the issues raised in [6] and [8]. 

The analysis in [6] addresses the convergence of the multilevel Monte Carlo method for simple 
output functionals, e.g. the L 2 or the H 1 norm of the solution. In practical situations, however, 
one is often interested in more complicated functionals, such as the outflow through parts of the 
boundary or the position of particles transported in the flow field. Here, we therefore extend 
the convergence analysis to cover bounded linear, as well as continuously Frechet differentiable 
nonlinear functionals of the solution. 

Another issue, raised both in [8] and [6], is the influence of the rough nature of our model 
problem on the performance of the multilevel Monte Carlo estimator. The oscillatory nature and 
the short characteristic length scale of the random coefficients puts a bound on how coarse the 
coarsest level in the multilevel estimator can be. Asymptotically, as the required accuracy goes 
to 0, this does not have any effect on the cost of the MLMC estimator. For a fixed tolerance, 
however, it restricts the gain that we can expect compared to a standard Monte Carlo estimator. 
In this paper, we propose a solution to this problem by using smoother approximations of the 
random coefficient on the coarse levels of the estimator. This allows us to choose the coarsest 
level independent of the length scale on which the random coefficient varies. See also |19j for a 
similar strategy in the context of the related Brinkman problem. In [19] the decay rate for the FE 
error with respect to the number of KL-modes K was assumed. Here we make no such assumption 
and instead use the decay rates established in |6] for certain log-normal fields and covariance 
functions. 

Finally, we extend the theoretical aspects of [6] to more challenging model problems. This 
includes problems posed on polygonal domains, which are frequently used in connection with finite 
element methods, as well as problems where the random coefficient has a jump discontinuity. It is 
well known that these type of model problems do not always exhibit full global regularity, which 
directly influences the convergence rates of the finite element error. 

The outline of the rest of the paper is as follows. In §2, we present the model problem, together 
with the main results on its regularity and finite element error estimates. This follows closely the 
work in [6] . The proof of the new regularity result for polygonal / polyhedral domains is postponed 
to §5. For the reader's convenience, we also briefly recall the multilevel Monte Carlo algorithm and 
the abstract convergence theorem. In §3, we prove convergence of the MLMC algorithm for a wide 
class of (linear and nonlinear) output functionals, including boundary fluxes and local averages of 
the pressure. In §4, we improve on the performance of the MLMC estimator by using smoother 
approximations of the random coefficient on coarser levels. The gains possible with this approach 
are verified both theoretically and numerically. Finally, in §5, we give a detailed proof of the 
regularity result stated in §2, and extend the results further to certain classes of discontinuous 
coefficients. 

The key task in this paper is to keep track of how the constants in the bounds and estimates 
depend on the coefficient a(w, x) and on the mesh size h. Hence, we will almost always be stating 
constants explicitly. Constants that do not depend on a(uj,x) or h will not be explicitly stated. 
Instead, we will write b < c for two positive quantities b and c, if b/c is uniformly bounded by a 
constant independent of a(u, x) and of h. 
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2 Background 



2.1 Problem setting and basic finite element error analysis 

Given a probability space P) and oj E f2, we consider the following linear elliptic partial 

differential equation (PDE) with random coefficients, posed on a bounded, Lipschitz polygo- 
nal/polyhedral domain D C M^, d = 2,3, and subject to Dirichlet boundary conditions: Find 
!i:UxD->18 such that 

—V-(a(uj,x)Vu(uj,x)) = f(u,x), for x £ D, (2.1) 
■u(w, x) = (j)j(uj, x), for x E Tj . 

The differential operators V- and V are with respect to x E D, and T := U^LjTj denotes the 
boundary of D, partitioned into straight line segments in 2D and into planar polygonal panels in 
3D. We assume that the boundary conditions are compatible, i.e. cftj = (pk, if Ij- (1 Tj. ^ 0. We 
also let (j) E i7 1 (.D) be an extension of the boundary data {<pj}"L 1 to the interior of D whose trace 
coincides with <pj on Tj. 

Let us formally define, for all oj E f2, 

amin(^) '■= mina(w, x) and a max (cj) := maxa(w, x). (2-2) 

We make the following assumptions on the input data: 
Al. 

a min ^ almost surely and 1/ G. m ; n E L p (£l), for all p E (0, oo). 
A2. a E Z7 (n, C*(Z))), for some < t < 1 and for all p E (0, oo). 

A3. / E LP*^,^- 1 ^)) and^- E L p * (0, L r ' +1 / 2 (r i )), j = 1, m, for some p* E (0,oo]. 

Here, the space C l {D) is the space of Holder-continuous functions with exponent t, H S (D) is the 
usual fractional order Sobolev space, and L q (Q, B) denotes the space of £>- valued random fields, for 
which the q th moment (with respect to the measure IP) of the ,8-norm is finite, see e.g [6]. A space 
which will appear frequently in the error analysis is the space L q (£l, Hq (D)), which denotes the 
space of Hq(D) -valued random fields with the norm on Hq(D) being the usual L rl (D)-seminorm 



h^(d)- We will weaken Assumption A2 in {5.2, and assume only piecewise continuity of a (a; 



but chose not to do this here for ease of presentation. For the same reason, we do not choose to 
weaken Assumptions Al and A2 to l/a m in and 1 1 « 1 1 (z?) having finite moments of order p a , for some 
p a E (0,oo), although this is possible. Finally, note that since a max (uj) = \\a\\comy Assumption A2 
implies that a max E L P (S7), for any p E (0,oo). 

To simplify the notation in the following, let < C a j t( j >:j < oo denote a generic constant which 
depends algebraically on L 9 (0)-norms of a max , l/a min , ||a|| c tp), an d \\4>j\\ H t+i/2( r .), 

with q < p* in the case of ||/||_fft-i(_D) an d ||j3'*+ 1 / 2 (r )- Two additional random variables related 
to output functionals will be added to this notation later. 

An example of a random field a(u, x) that satisfies Assumptions Al and A2, for all p E (0, oo), 
is a log-normal random field a = exp(g), where the underlying Gaussian field g has a Holder- 
continuous mean and a Lipschitz continuous covariance function. For example, g could have con- 
stant mean and an exponential covariance function, given by 



E 



(g(uj,x)-E[g(cu,x)})(g(uj,y)-E[g(u>,y)}) = a 2 exp(-||x - y\\/X) (2.3) 



where a 2 and A are real parameters known as the variance and correlation length, and || • || denotes 
a norm on M. d . If || • || = || • we will call it a p-norm exponential covariance. 
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If we denote by H](D) := {v G H l (D) : v — <fi = on T}, then the variational formulation of 



(2.1 ), parametrised by w £ fi, is to find u G H^(D) such that 

bu(u(oo, ■),«) =M«), fora11 ^e^dO )- (2-4) 

The bilinear form 6 W and the linear functional L w (both parametrised by oj) are defined as usual, 
for all u,ve H X (D), by 



b UJ (u,v):=J a(uj,x)Vu(x) ■ Vv(x)dx and L w (d) := (/(w, ■)i")jt-i(D)^j-'(D) i ( 2 - 5 ) 
where H^\D) is the closure of C§°(£>) in the H 1 ~ t (D) -norm. We say that for any w £ !1, 



(w, •) is a weak solution of (2.1) iff u(u}, •) G H^(D) and u(w, •) satisfies (2.4). An application 



of the Lax-Milgram Theorem ensures existence and uniqueness of u(oj, •) G H^(D), for almost all 
w, and in combination with Assumptions Al- A3, this gives the existence of a unique solution 
u G L P (Q,H 1 (D)), for any p < p*. 

In [6j, a regularity analysis of the above model problem was performed under the assumptions 
that the spatial domain D is C 2 . Here, the analysis is extended to polygonal domains, or more 
generally, to piecewise C 2 domains that are rectilinear near the corners. This is very important 
since in standard finite element methods one naturally works with polygonal/polyhedral domains. 

Definition 2.1. Let < \a(D) < 1 be such that for any < s < Xa{D),s 7^ |, the Laplace 
operator A is surjective as an operator from H l+S (D) n Hq(D) to H S ^ 1 (D). In other words, 
let \a(D) be no larger than the order of the strongest singularity of the Laplace operator with 
homogeneous Dirichlet boundary conditions on D. 



Theorem 2.2. Let Assumptions A1-A3 hold for some < t < 1. Then, 
a max (u))\\a(u),-)\\* t(B 

\\U(UJ, £ . 7-y 

u 'mm\ UJ ) 



in 



-)\\h^(d) + \\a(u, -)\\ct(D) H^'( w ' Ollfft+i/a^) 

(2-6) 

/or almost all uj G O anc? /or all < s < t such that s < Aa(-D)- Moreover, u G L p (i7, H l+S (D)) , 
for all p < p*. If t = \a(D) = 1, then u G L P (S7, H 2 (D)) and the above bound holds with s = 1. 

Proof. The proof for individual samples, for almost all w G f2, is a classical result and follows 
Grisvard [23l Section 5.2]. A detailed proof making precise the dependence of the bound on the 



coefficients is given in Section 5.1 The remainder of the theorem follows by Holder's inequality 



from Assumptions A1-A3. □ 



Theorem 2.2 can now be used to prove convergence of finite element approximations of u in 
the standard way. We will only consider lowest order elements in detail. Introduce a triangulation 
Th of D, and let Vh be the space of continuous, piecewise linear functions on D that satisfy the 



boundary conditions in (2.1), i.e. 



Vh,<l> '■= { v h £ C(D) : Vh\T linear, for all T G Th, and ^/i|r\, = f° r an 3 = 1> • • • > m } ■ 

For simplicity we assume that the functions 4>j, j = 1, . . . ,m, are piecewise linear with respect to 
the triangulation Th restricted to Tj. To deal with more general boundary conditions is a standard 
exercise in finite element analysis (see e.g. [U §10.2]). 

The finite element approximation of u in Vh6, denoted by Uh, is now found by solving 

b w (u h (uj, -),v) = L u (v) , for all v G V h fl, 

Using Cea's lemma and standard interpolation results on Vh6, we then have (as in [6]) the following. 
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Theorem 2.3. Let Assumptions A1-A3 hold for some < t < 1. Then, 

\(u-U h )(u),-)\ H ir D \ < f amax ^ \ \\u(u,-)\\ H i+sr D) h S 

for almost all u) £ Q and for all < s < t such that s < Xa(D). Hence, 

\\ u ~ u h\\LP{n,Hl{D)) < C aJ,4> 3 h*, f or al1 P < P* , 

wzi/i C a j^ j < oo a constant that depends on the input data, but is independent ofh. If A1-A3 hold 
with t = Aa(-D) = 1, then \\u - Uh\\ip(n,H^(D)) < Ca,/,^ ^- 

The key novel result here (extending the results in [6]) is that the rate of convergence of the 
finite element error on polygonal/polyhedral domains D is the same as on C 2 domains provided 
the order of the strongest singularity for the Laplacian on D is no stronger than t in A1-A3. No 
additional or stronger singularities are triggered by the random coefficient provided a satisfies A2. 
A sufficient (but not necessary) condition for t = 1 is that D is convex. For t < 1, even certain 
concave domains are allowed. 

Remark 2.4. The results can easily be extended also to Neumann and mixed Dirichlet/Neumann 



boundary conditions. We will comment on this in Section 5.1 and confirm it with some of the 



numerical results in Section 3.5 There is also no fundamental difficulty in extending the analysis 



to higher order finite elements. For an extension to mixed finite elements see [21] , 
2.2 Multilevel Monte Carlo Algorithm 

Before we go on to the main part of this paper, we will briefly recall the multilevel Monte Carlo 
algorithm. We also give a review of the main results on its convergence when applied to elliptic 
PDEs of the form described in the previous section. 

Suppose we are interested in finding the expected value of some functional Q = M(u) of the 



solution u to our model problem (2.1). Since u is not easily accessible, Q is often approximated by 
the quantity Qh '■= M(uh), where Uh is a finite dimensional approximation to u, such as the finite 
element solution on a sufficiently fine spatial grid Th defined above. However, Uh may also include 
further approximations such as an inexact bilinear form b^(-, •) « 6^(-, •), e.g. due to quadrature 
or approximation of the input random field a. We will return to this issue in Section |4j 

To estimate E [Q] , we then compute approximations (or estimators) Qh to E [Qh] , and quantify 
the accuracy of our approximations via the root mean square error (RMSE) 

e(Q h ) := (E[(Q h -HQ))^ [ 2 

The computational cost C e (Qh) of our estimator is then quantified by the number of floating point 
operations that are needed to achieve a RMSE of e{Qh) < £■ This will be referred to as the e-cost. 
The classical Monte Carlo (MC) estimator for E [Qh] is 

1 N 

i=i 

where Qh{oj^) is the ith sample of Qh and N independent samples are computed in total. 



There are two sources of error in the estimator (2.7), the approximation of Q by Qh, which is 
related to the spatial discretisation, and the sampling error due to replacing the expected value 
by a finite sample average. This becomes clear when expanding the mean square error (MSE) 
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and using the fact that for Monte Carlo E[Q^] = E[Q h ] and Y[Qf$] = N~ l V[Q h ], where 
Y[X] := E[(X — E[X]) 2 ] denotes the variance of the random variable X : — > R. We get 

e (QMC )2 = N -i Y [Q h ] + (E[Q h -Q}) 2 . (2.8) 

A sufficient condition to achieve a RMSE of e with this estimator is that both of these terms are 
less than e 2 /2. For the first term, this is achieved by choosing a large enough number of samples, 
N = 0(e~ 2 ). For the second term, we need to choose a fine enough finite element mesh Th, such 
that E[Q h -Q} = 0(e). 

The main idea of the MLMC estimator is very simple. We sample not just from one approxi- 
mation Qh of Q, but from several. Linearity of the expectation operator implies that 

L 

E[Q h ] = E[Q ho ] + E lQht ~ Qh^} (2.9) 
1=1 

where {^}^=o,...,l are the mesh widths of a sequence of increasingly fine triangulations Th e with 
Th '■= Th L , the finest mesh, and k\ < hi-\/h(. < &2, for all I = 1, . . . , L and some 1 < k\ < < oo. 
Hence, the expectation on the finest mesh is equal to the expectation on the coarsest mesh, plus a 
sum of corrections adding the difference in expectation between simulations on consecutive meshes. 
The multilevel idea is now to independently estimate each of these terms such that the overall 
variance is minimised for a fixed computational cost. 

Setting for convenience Yq := Qh and Y( := Qh t — Qh t _ X i for 1 < £ < L, we define the MLMC 
estimator simply as 

L L N e 

i=o e=o i=i 

where importantly Y^{oj^) = Qh t {uj^) — Q/i^_ 1 (w^), i.e. using the same sample on both meshes. 

Since all the expectations E\Y(\ are estimated in dep endently in (2.9 ), the variance of the MLMC 
estimator is Yle=o Y[Ye] and expanding as in (2.8) leads again to 

<QZ{N e} ) 2 ■= ^[(Qhmy-nQ]) 2 ] = E^" lv M + (nQh-Q}) 2 . (2.ii) 

e=o 

As in the classical MC case before, we see that the MSE consists of two terms, the variance of the 
estimator and the error in mean between Q and Qh- Note that the second term is identical to the 
second term for the classical MC method in (2.8). 

Let now Ci denote the cost to obtain one sample of Qh e - Then we have the following results on 
the e-cost of the MLMC estimator (cf. [51 US]). 

Theorem 2.5. Suppose that there are positive constants a, /?, 7, c M1 , c M2 , c M3 > such that a> 
I min(/3, 7) and 

Ml. \E[Q h -Q}\ <c M1 h a 



M2. V[Q hl - Qh^} < c M2 h p t , 
M3. C £ < c M3 hj"<, 

Then, for any e < e^ 1 , there exist an L and a sequence {Nj>}f =0 , such that e (Q/f/jv £ i) < e anc 

{e" 2 , if > 7, 

£ - 2 (loge) 2 , z//3 = 7 , 
e -2-( 7 -«/ a> if j3 <7 , 

where the hidden constant depends on c M i , c M 2 and c M 3 • 
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In [6] , it was shown that for our model problem and for the functional Q 



u 



m(D) 



it follows 



immediately from the finite element error result in Theorem 2.3 that Assumptions M1-M2 in 
Theorem 2.5 hold with a < t and f3 < 2t and for 1 < q < p*/2 provided Assumptions A1-A3 hold 
with < t < 1 and p* G (0, oo). For t = 1, we can even choose a = 1 and (3 = 2. Using a duality 
argument we also showed that under the same hypotheses we could expect twice these rates for 



the functional Q :- 



u 



p(Dy 



The aim is now to extend this theory to cover also other functionals 



of the solution u (see q3ft as well as level-dependent estimators (see Q 



3 Output functionals 

In practical applications, one is often interested in the expected values of certain functionals of the 
solution. In the context of groundwater flow modelling, this could for example be the value of the 
pressure or the Darcy flux at or around a given point in the computational domain, or the outflow 
over parts of the boundary. It could also be something more complicated, such as positions and 
travel times of particles released somewhere in the computational domain (see e.g. |20|). 

A standard technique to prove convergence for finite element approximations of output func- 
tionals is to use a duality argument, similar to the classic Aubin-Nitsche trick used to prove optimal 
convergence rates for the L 2 (D)-norm. The specific boundary conditions and forcing terms of the 
dual problem will depend on the output functional considered. A further advantage of using a 
duality argument to prove convergence of output functionals, is that the analysis can be used 
as a starting point for further developments, such as adaptively refined meshes and adjoint error 
correction (|321 118j). These are areas we aim to explore further in the future. 

We will in the following consider both linear and non-linear functionals. We denote the func- 
tional of interest by M w (w), for v G H l (D). Like the bilinear form b^^, •), the functional M w (-) is 
again parametrised by u, and the analysis is done almost surely in u. When the functional does 
not depend on w, we will simply write M(-) instead of M u (-). Our analysis follows mainly |18j . 



3.1 Linear functionals 

Since it is simpler, we will first look at linear functionals. Let us assume for the moment that 
M w : H l {D) -)• R is linear and bounded on H^(D), i.e. M w {v) < \\v\\ H ir D \, for all v G H^(D). 
Now, let us associate with our primal problem (2.4) the following dual problem: find z(lo, •) G Hq(D) 
such that 

b u (v, z(u, •)) = M u (v) , for all v G Hq(D), (3.1) 

and denote by Zh{w, •) G Vh,o the finite element approximation to \2>.\\ . We can again apply the 
Lax-Milgram Theorem to ensure existence and uniqueness of a weak solution z(u, ■) G Hq(D), for 
almost all u. Moreover, since 6 ;x) (*, •) is symmetric, we will also be able to apply Theorems [2.2| and 



2.3 However, first we make the following observation. 



Lemma 3.1. Let M w : Hq{D) — > M. be linear and bounded. Then, for almost all oj G Q, 

\M U (u(u, •)) - M w (u h (u, -))| < a max (w) \u(u, •) - u h (u, -)\m(D) ■) - z h (u, -)\w(d) ■ ( 3 - 2 ) 

Proof. Dropping for brevity the dependence of the FE functions on oj and using the linearity of 
M(j, the dual problem (|3.1|), as well as Galerkin orthogonality for the primal problem, we have 



\M u (u) - M u (u h )\ = \bu(u - u h ,z)\ = \buiu -u h ,z- z h )\ < a max (a;) \u - u h \m{D) \z ~ zh\h 1 (D) > 
where in the last step we have used the definition of a max (w) and the Cauchy-Schwarz inequality. □ 
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This simple argument will be crucial to obtain optimal convergence rates for functionals. Pro- 
vided the functional is bounded not just in H 1 (D), but also in H 1 ^ 1 * (D), for some i* > t, where t 
is as in Assumptions A1-A3, the finite element solution Zh of the dual problem will converge with 
any order s < t (like the primal solution u/ t ). This will allow us (as for the L 2 (D)-norm) to obtain 



a convergence rate twice that of the H 1 (D)-novm in Theorem 2.3 However, the assumption that 



M w is linear is not necessary, and so we will first generalise the above to nonlinear functionals. 
3.2 Nonlinear functionals 



For nonlinear functionals, the dual problem in [32j is not defined as in (3.1) above. Instead, a 
different functional is chosen on the right hand side (which reduces to M u in the linear case). 
It is related to the derivative of the functional of interest and so we need to assume a certain 
differentiability of M u . We will assume here that M u is continuously Frechet differentiable. In 
particular, this implies that M w is also Gateaux differentiable, with the two derivatives being the 



same. We will see in Remark 3.3 below that it is in fact not necessary that M u is continuously 
Frechet differentiable everywhere, but it simplifies the presentation greatly. 

Let v,v G H 1 (D). Then the Gateaux derivative of M w at v and in the direction v is defined as 

DyMuiv) := hm . 

We define 



DvM^u, u h ) := f D v M w (u + 6(u h - u)) d6, 
Jo 

which is in some sense an average derivative of M w on the path from u to Uh, and define the dual 
problem now as: find z(oj, •) G Hq(D) such that 

b u (v,z(u r )) = DX,(u,u h ), for all v G H%(D). (3.3) 

Note that, for any linear functional M w , we have D v M u) (u,Uh) = M^^v), for all v G Hq(D), and 



so (3.3) is equivalent to (3.1) 



For our further analysis, we need to make the following assumption on M w . 



Fl. Let u (resp. Uh) be the exact (resp. the FE) solution of (2.1). Let be continuously 
Frechet differentiable, and suppose that there exists i* G [0, 1], g* G (0, oo] and Cf G L q *(Q), 
such that 

\D v Mw(u,iih)\ < Cp(o;)||u||^i-t«(£)) , for all v G Hq(D) and for almost all 

To get well-posedness of the dual problem, as well as existence and uniqueness of the dual solution 
z(uj, ■) G Hq(D), for almost all w £ O, it would have been sufficient (as in the linear case courtesy 
of the Lax-Milgram Theorem) to assume that \D v M ul {u,Uh)\ is bounded in H 1 (D). However, in 
order to apply Theorem |2.3| and to prove convergence of the finite element approximation of the 
dual solution, it is necessary to require stronger spatial regularity for z. This is only possible if we 
assume boundedness of \D v M u (u,Uf l )\ in H 1 ^ t *(D) for some > 0. In particular, if Assumptions 
A1-A3 and Fl are satisfied with t G (0, 1] and t* > t, then for almost all wGO, 

||*(a;,-)||fli+.(£>) < 7-74 Cf(w), 

for any < s < t such that s < \a(D) and for almost all well. Hence, 

\\ z ~ z h\\ L p(n,Hl(D)) < C a ,c F h s , for all p < q* , (3.4) 
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for some constant C a; c F < oo depending on a and the constant Cp in Fl. 

Moreover, from the Fundamental Theorem of Calculus for Frechet derivatives, it follows that 



Muiu) - Mu(u h ) = / D u ^ Uh Mu{u + 0{u h -u))d6 = D u _ Uh M w {u,u h ) = b u (u-u h ,z) (3.5) 
J o 

and so we have again the following error bound. 
Lemma 3.2. Let Assumption Fl be satisfied, then 

\M U (tt(o/, •)) - M w {u h (u, -))| < a max (cj) \u(u, •) - u h (u, -)\m(D) ■) - z h (oj, -)l^i(D) , (3-6) 
for almost all oo 6 O. 



Similar to the bound in (3.6), one can also find a bound of the error between two finite element 
approximations in Vh and in Vh C Vh, namely 

\M W (u h (to, •)) - M w (uh(lj, -))| < a m ax(w) -)-u H (u, -)ljyi(D) kft(^, -)-^(^, OIh^-D) ( 3 - 7 ) 



In the next section, we will use (3.6) and (3.7) to find optimal rates for (non)linear functionals in 
Assumptions Ml and M2 of the MLMC convergence theorem. 

Remark 3.3. As already mentioned above, continuous Frechet differentiability is not a necessary 
condition. It is possible to weaken Assumption Fl and to assume only slant differentiability of 
Af w (-). The concept of slant differentiability was introduced in [7], where it was also shown that 
an operator F : X — > Y, for two Banach spaces X and Y, is slant differentiable iff it is Lipschitz 
continuous. Most importantly, however, slant differentiability is sufficient for proving (3.5), and 
thus Lemma 13.21 



3.3 Multilevel Monte Carlo convergence for functionals 

We are now ready to prove optimal convergence rates for the MLMC algorithm for Frechet differ- 



entiable (and thus also for linear) functionals as defined above. In order to apply Theorem 2.5 
need bounds on the following two quantities: 

(i) |E[M u (u)-M w (u fc )]| 

(ii) Y [M w {u h£ ) - Muiuh^)] 



wc 



Using the finite element error analysis in Theorem 2.3 together with the bounds in Lemma 3.2 



and in Equation ( |3.7[ ), we are able to derive the following bounds for the convergence rates with 
respect to h for (i) and (ii). 

Proposition 3.4. Let Assumptions A1-A3 hold for some < t < 1 and > 2, and l et M w (-) 
satisfy Assumption Fl with t* >t and > p P 2 2 ■ Then Assumptions M1-M2 in Theorem 
for any a < 2t and f3 < At. For t = 1, we can choose a = 2 and (3 = 4. 



2.5 hold 



Proof. Using Lemma |3.2| and Holder's inequality, we have 

|E [M u (u) - M u (uh)]\ < \\(haax\\LPi(q)\\u- Uh\\LP2(n,H^{D))\\ z - z h\\LP3((l,H£(D)) ( 3 - 8 ) 

where ELi^" 1 = l A11 

norms on the right hand side are finite, if we choose p\ < oo, p 2 < p. 
and ps < q*, which is possible if p~ x + q^ 1 < 1, in particular if > 2 and q* > 
t < 1, it then follows from (3.4) and Theorem 2.3 that 



2p» 



-2 ' 



*? 

In the case 



|e[m w («)-m w (« a )]| < c^f^cr h 



for any a < 2t. 



9 



Similarly, using (3.7) and Holder's inequality, we have 



V[Af w (uO-M* («/»<_!)] < E [\Mu(u hl ) - MwK^Jp 



< of 



lmax|| i 2p 1 (Q)||n^ — Uf le _ 1 \\ L 2p 2 ( n j I i( D ^\\zh e — IIl2 P3 ^ iH 1( D )) (3.9) 



where ^f—ipr 1 = 1. Again, the norms on the right hand side of (3.9) are finite, if we choose 
pi < oo, P2 < p*/2, and p^ < q*/2, which is possible due to our as sum ptions that > 2 and 

that 



a > 

1* ^ p*-2 



In the case t < 1, it follows again from (3.4) and Theorem 



2.3 



V [M w (u he ) - M^uh^)] < C a ,f,4 s ,c F h p , for any /3 < 4t. 
The slightly faster rates of a = 2 and /3 = 4, for i = 1, can be proved analogously. 



□ 



Remark 3.5. In practice, it is in general necessary to use quadrature to compute the integrals 
in the bilinear form b w (v,w), thus leading to approximate, mesh-dependent bilinear forms. As a 
consequence we will compute only an approximate finite element solution Uh £ Vh and Galerkin 
orthogonality for the primal problem is lost. In general, it is then only possible to prove 

\M W (u(u, •)) - M u (u h (uj, -))| < C P (u) \\u{uj, ■) - u h (uj, -)lki(D), 



instead of (3.6), where Cf is the constant from Assumption Fl. It is in fact sufficient that Fl 



holds with t* = in this case. Consequently, it is only possible to verify Assumptions M1-M2 in 



Theorem 2.5 for a < t and j3 < 2t in the case t < 1. Similarly, we can only prove M1-M2 with 
a = 1 and (3 



2, if t = 1. The higher rates of convergence from Proposition 3.4 can be recovered, 
also in the presence of quadrature error, if the coefficient function has additional regularity, i.e. 
if a(cj, •) G C r (D), with r > 2t. For an example of a log-normal random field which has this 



additional regularity, see £4.1 



One can also generalise the results in this section to the case where the dual solution has less 
spatial regularity than the primal solution. For example, if Fl holds only for some i* G [0,t), 



Assumptions M1-M2 in Theorem 2.5 can still be verified, for any a < t + t* and (3 < 2{t + t*). 



3.4 Examples of output functionals 

Before we go on to show some numerical results, we give some examples of output functionals 



which fit into the framework of £3.1 3.3 We start with linear functionals. 



(a) Point evaluations of pressure: Since a(u, •) G C*(Z?) C C(D), we know that trajectories 
of the solution u are in ^(D) (see e.g. p3|), and it is meaningful to consider point values. 
Consider M^\u) := u(x*), for some x* £ D. For Del, i.e. in one space dimension, we 
have the compact embedding H l l 2+& '(D) <—> C S (D), for any 5 > 0, and so 



MW(v) = v(x*) < |M 



sup ^5 



IF/ 2 +i(fl), 



for all v € H l (D). 



Hence, Assumption Fl is satisfied for any t* < min(J,t) with Cf = 1 and q* = oo. 

In space dimensions higher than one, point evaluation of the pressure u is not a bounded 
functional on Hq(D). One often regularises this type of functional by approximating the 
point value by a local average, 



M^(i 



\D*\ 



v(oj, x) dx 



D* 



v(u, X*) 
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where D* is a small subdomain of D that contains x* |18j . Here, M^ 2 ) satisfies Fl with 
Cf = 1, i* = 1 and = oo, due to the Cauchy-Schwarz inequality. 

Similarly, point evaluations of the flux —aVu can be approximated by a local average. How- 
ever, in this case Fl only holds for t* = with Cf = a max and (/* = oo, and the convergence 
rate thus is the same as for the i/ 1 -seminorm. 

Next we give some examples of non-linear functionals. The first obvious example is to estimate 
higher order moments of linear functionals. 

(b) Second moment of average local pressure: Let M w be an arbitrary linear functional 
and let q > 1. Then 



D v (M u (v)*) = lim 



MUv + ev)i - M w {vf 



= lim (MUv) + eMUv)y-M U v)« = qM ^ M ^ 

£-S>0 £ 

Thus, in case of the second moment of the average local pressure M^ 3 ' \v) := (^M^ 2 \v) 
this gives 



D v M<?\v) 



\D*\ 2 



v(x) dx 



D* 



v(x) dx 



D* 



and so 



\D v M {3 \u,u h )\ 



\D*\ 2 
1 

\D*\ 2 



D 



v(x) dx 

ft 

v(x) dx 



D 



(u + 0(uh — u))(x) dxd9 

JD* 

u(uj, x) + Uh(uj, x)) dx 



D 



< 



-)IIl2(D) + -)II-L 2 (D)) IMIl2(D) • 

v ' 

=:C F (u;) 



Now, it follows from the Lax-Milgram Theorem that Cf(w) < ||/( w > ')\\H- 1 (D)/ a roin( i ^), an d 
so Assumption Fl is satisfied for all t* < 1 and < . 

(c) Outflow through boundary: Consider M^(u) := L u (ip) — bu,(ip,v), for some given 
function ^ 6 H 1 (D). Note that for the solution it of (|2.4[), by Green's formula, we have 



D 



M^\u)= I ip(x)f(x,uj)dx — j a(u), x)Vip(x) ■ Vu(u, x) dx 

ip(x) V • (a(oj, x)Vu(uj, x)) dx — \ a(oj, x)Vip(x) ■ Vu(w, x) dx 
ijj{x)a{uj, x)Vu{bj, x) ■ v ds . 



D 



D 



(3.10) 



Thus, M^\u) is equal to the outflow through the boundary F weighted by the function ip, 
and so can be used to approximate the flux through a part r out C T of the boundary, 
by setting V|r out ~ 1 and Vlr\r out ~ 0, see e.g. [HOdl]. 
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Note that for / ^ this functional is only afhne, not linear. When / = 0, then it is linear. 
In any case, 

D M^iv) ■= lim M " ] ^ + £V ) ~ M ^\v) = lim ~ Id q K s) W0*0 ■ V(ev(u, x)) dx 

a(oj, x)Vip(x) ■ Vv(x) dx = / v(x)V ■ (a(u),x)V^(x)) dx , 
d Jd 

for v,v £ Hq(D). Since this is independent of v, we have in particular 



(u,Uh)= / v(x) V • (a(u),x)Vip(x)) dx. 



D 



If we now assume that Assumptions A1-A3 are satisfied for some < t < 1 and that ip 6 
H 1+t (D), then using Theorems 9.1.12 and 6.2.25 in [25] (see also Lemmas A.l and A. 2 in 
[BJ), we have V?/> 6 H l (D) and for any t* < t, 



\D v M^\u,u h )\ < ||V ■ (a(w, OV-i/O ||/rt*-i(D)||u||ifi-t*(D) 



< || (a(w,-)V^) ||jjt*(£>)IMIiri-«*(D) 
< 



l a ( w >-)llct(D)ll V ^llift*(D)lkllifl-**(D)- (3- 11 ) 



IC*(D)" 



Hence, Assumption Fl is satisfied, for any < oo and t* < i, with Cf(lj) = \\a(oj, •) 
If t = 1, then estimate (3.11) holds with t* = t = 1, and Assumption Fl is satisfied with 
= 1. Our assumption on ip is satisfied for example if tp is linear, which is a suitable choice 
for the numerical test in the next section. 

Note that the functional j^- f T a(oj,x)Vu(oj,x) ■ vds (or rather its regularised equivalent 
over a narrow region near r out ) can only be bounded in H 1 ^) and thus it will converge with a 

(5) 

slower rate than M\ . 



3.5 Numerics 



We consider two different model problems in 2D, both in the unit square D 
with / = 1 and = 0, i.e. 



(0, l) 2 : either pi) 



-V • (a(uj, s)Vu(w, x)) = 1, for x 6 D, and u(oj,x) = for x G 9D, 



or the mixed boundary value problem 



la:i=Q 



1, 



—V • (a(uj, x)Vu(uj, x)) 
du 



I Xl=l 



0. 



Di 



X2=0 



0, 
0, 



for x € D, 



(3.12) 



(3.13) 



du 
dv 



0. 



X2 = l 



We take a(w, x) to be a log-normal random field with exponential covariance function (using the 
2-norm in (2.3)) and the underlying Gaussian field has mean zero. We choose A = 0.3 and a 1 = 1. 
The finite element solutions are computed on a family of uniform triangular grids Th with mesh 
widths h = 1/2, 1/4, . . . , 1/128. The sampling from a(u,x) is done using a circulant embedding 
technique (for details see [1 H [20] ) . To assemble the stiffness matrix we have to use a quadrature 
rule. We chose the trapezoidal rule, evaluating the coefficient function at the vertices of the grids. 

First, we consider the approxi mati on of the pressure at the centre of the domain for model 
problem (3.12). As described in ^3.4 for functional M^ 2 \ we approximate it by the average of 
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111 9^1111 ?1 

10 10 10 10 10 10 10 10 



1/h 1/h 

Figure 1: Left plot: |E [M( 2 )(u h ») - M^\u h )] |, for 2D model problem ( |3.12[ ) with A = 0.3, a 2 = 1 
and h* = 1/256. Right plot: Corresponding variance V [M^\u h ) - M^(u 2h )]. The gradient of 
the dotted (resp. dashed) line is —1 (resp. —2). 




Figure 2: 

r 2 



Left plot: 



for 2D model problem ( 3.12[ ) with A 



E[MW(u h *) 2 -MW(u h ) 
a" = 1 and h* = 1/256. Right plot: Corresponding variance 
gradient of the dotted (resp. dashed) line is —1 (resp. —2). 



M( 2 )( 



U2h) 



0.3, 
The 



Uh over the region D* , which is chosen to consist of the six elements (of a uniform grid with 
h* = 1/256) adjacent to the node at (1/2, 1/2). To estimate the errors we approximated the exact 
solution u by a reference solution Uh* on a grid with mesh width h* = 1/256. In Figure [TJ we see 
that |E [M^ 2 \u h *) - M^ 2 \u h )\ | converges linearly in h and V [M^ 2 \u h ) - M( 2 )(u 2 h)] converges 
quadratically, as predicted by Lemma 3.4 for the "exact" FE solution. However, in the context 
of numerical quadrature this is better than expected (cf. Remark 3.5). This suggests that the 
quadrature error is not dominant here. In Figure [2j we estimate the second moment of the same 



functional, and see that also in this case we observe the convergence rates predicted by Lemma 3.4 

we consider an approximation of t 
1} computed via the functional 



For the second model problem (3.13), 



through the boundary r out 
function we choose the linear function ip(x) - 

equal to at all other Dirichlet nodes. Thus, 



m 



le average outflow 
3.4[ As the weight 
xi, which is equal to 1 at all nodes on r out and 
Mw (u) is exactly equal to the flow through r out . 
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Figure 3: Left: Plot of |E[Af£ 4) (ufc«) - M^'(u h )]\, for 2D model gig) problem with A = 0.3 
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1, tp = xi and /i* = 1/256. Right: Corresponding variance V[Mw (ua) — (u2h)] ■ The 



gradient of the dotted (resp. dashed) line is —1 (resp. —2). 



As predicted we see again linear convergence in h for \E[Mi 4 \u h *) - M L 
convergence for Y[M^ 4 \uh) — M^\u2h)] in Figure [sj 

4 Level dependent estimators 



Uh) 



and quadratic 



The key ingredient in the multilevel Monte Carlo algorithm is the telescoping sum (2.9) 



R[Q h ] = E[Q ho ] + Y^E[Q he - Q he 



Looking at this equation more carefully, we see that we are free to choose how to approximate Q 
on the different levels, without violating the above identity, as long as the approximation of Qh e 
is the same in the two terms in which it appears on the right hand side, for £ = 0, ...,L — 1. In 
particular, this implies that we do not have to approximate Q on level I — 1 in the same way as we 
approximate it on level £. We can, for example, approximate the coefficient a{u, x) differently on 
each level, without introducing any additional bias in the final result EfQ^]. 

This is particularly useful in groundwater flow modelling, where the random fields a(u, x) 
are highly oscillatory and vary on a fine scale. The coarsest grids of the (plain-vanilla) MLMC 
estimator will not be able to resolve the coefficient well. As a consequence of this, one needs 
to choose the coarsest grid size ho smaller than a certain threshold to get the MLMC estimator 
with the smallest absolute cost. This limits the number of levels and the amount of benefit that 
the MLMC estimator potentially offers. Numerical investigations in [8], for example, show that 
for log-normal random fields a(u;,x) with exponential, 1-norm covariance function and correlation 
length A, the optimal choice is Hq ~ A. A possible solution to this problem, which will allow us to 
choose ho independent of A and thus achieve higher gains, is to use smoother approximations of 
the coefficient on the coarser levels. We will present one way of doing this in §4.1[ where we use 
level-dependent truncations of the Karhunen-Loeve expansion of a(u,x). 

Before we go on to analyse the level-dependent estimators for log-normal coefficient fields, we 
would like to point out that even though this strategy does not introduce any additional bias in 



the final result EfQ^], it may influence the values of the convergence rates a and (3 in Theorem 2.5 



14 



One has to be careful not to introduce any additional model/approximation errors that decay at a 
slower rate than the discretisation error. 



4.1 Truncated KL-expansions 



As an exemplary case, let us now consider log-normal random fields with exponential, 1-norm 
covariance, i.e. covariance function (2.3) with ||x|| = [|sc||i := Yl%= 



Xi 



We will comment on the 



general case at the end of the section. 

For a Gaussian random field g, the Karhunen-Loeve (KL) expansion is an expansion in terms 
of a countable set of independent, standard Gaussian random variables {£n}neN- It is given by 



g(u, x) = E [g(u, x)] + ^ \ff^b n {x)£, n (u) , 



n=l 

where {# n }neN are the eigenvalues and {6 n }neN are the corresponding normalised eigenfunctions 
of the covariance operator with kernel function 

C(x, y) := E [(g(co, x) - E[g(u, x)])(g(co, y) - E[g(u, y)}) 

For more details on the derivation, see e.g. [T3] . 

The log-normal coefficient field shall then be written as 



a(u>, x) = exp 



E [g(u>,x)\ + \fOnb n {x)£ n (u) 



n=l 



and the random fields resulting from truncated expansions with K E N terms shall be denoted by 

K 

g K {u,x) := E [g(u, x)] + ^ y0nb n (x)£ n (u)) and a K (u, x) := exp [ffjf(w, x)) . 



n=l 



Moreover, we denote by uk £ H\(D) the weak solution to 

-V • (a K (uj, x)Vu k (uj, x)) = f(u, x), 
uk(uj,x) = <j)j(u),x), 



for x G D, 
for x G r,- . 



(4.1) 



i.e. our model problem (2.1) with the coefficient a replaced by its .fT-term approximation. The 



finite element approximation of uk in Vh,d> is denoted by UKh- It nas been shown in E] that in 
the case of the 1-norm exponential covariance, Assumptions A1-A2 are satisfied also for clk, for 



any t < 1/2 (independent of K). Therefore the theory in the earlier sections applies also to (4.1). 

Since the convergence with respect to K is quite slow (see below), to get a good approximation to 
E[<2/t] we need to include a large number of terms on the finest grid, both in the case of the standard 
and the MLMC estimator. However, as mentioned at the beginning of this section, we are free in the 
MLMC estimator to choose different approximations of a(oj, x) on the coarser levels. In particular, 
we can choose to include fewer terms in the KL-expansion above. The eigenvalues {# n }neN are 
all non-negative with Yln>i ®n < +oo. If we order them in decreasing order of magnitude, the 
corresponding eigenfunctions {6 n }neN w ih be ordered in increasing order of oscillations over D. 
By truncating the KL-expansion after Kg < K terms, we are hence disregarding the contributions 
of the most oscillatory eigenfunctions, and a^ £ (o;, x) is a smoother approximation of a(oj,x) than 
cik(w,x) leading to FE problems that can be solved more accurately on the coarser levels. The 
key question is then, how we should choose Ki in terms of t (or equivalently hi). As an example 
of how to determine a suitable strategy, we make use of the following results from [5j |6] on the 
convergence of uk,k to u in the 1-norm exponential covariance case. See below for comments on 
strategies for other fields. 
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Proposition 4.1. Let a be a log-normal random field with 1-norm exponential covariance, and 
suppose that Assumption A3 is satisfied for some p* G (0, oo] and for t > 1/2. Then, 



\u-UK,h\\LP(n,H£(D)) < Cajrfj {h s + K~ 



and 



■UK,h\\LP(n,L' 2 (D)) < CaJ^j {h 2s + K s ) 



for all p < p* and < s < 1/2. The hidden constant is independent of h and K . 

As in the previous sections this result can again be extended in a straightforward way to 
functionals. 



Corollary 4.2. Let the assumptions of Proposition \4^1\ be satisfied and suppose that for the trun- 
cated problem (4.1) and for the functional M w (-) we have Assumption Fl satisfied with t* > \ 
and G (0, oo], i.e. is Frechet differ entiable and D v M u (uK,UK,h) is bounded in i? 1_< *(D). 
Assume further that there exists C' F G L q *(Q) such that D v M u (u,uk) < C^IMI/f^D)* for all 
v G Hq(D). Then 

\\MUu) - M u (u K ,h)\\LP(n) < C a>M . jCF ,C F {h 2s + K' s ) , 

for any p < + and < s < 1/2. The hidden constant is again independent of h and K. 

Proof. First note that due to the triangle inequality, we have of course 

\M u {u K>h ) - M w (u)\ < \M u (uK,h) ~ M u {u K )\ + \M u (u K ) - M u (u)\ (4.2) 

As noted above, it follows from [5J §7] that Assumptions A1-A2 are satisfied for the truncated 
expansion of a log-normal random field with 1-norm exponential covariance. Since Assump- 



tion A3 is also assumed to hold, it follows as in Proposition 3.4 from Holder's inequality that the 
L p -norm of the first term in (4.2) is 0(h 2s ), for any p < + and < s < 1/2, with a 

constant that is independent of h and K. 

To bound the second term in (|4.2|), we can use (|3.5|) so that by assumption 

(4.3) 



|M w (m) - Mu{uk)\ = D u ^ UK M UJ (u, uk) < C' F \\u - u K \\m{D) ■ 



It follows from [6l Proposition 2.8] that ||u — ukWh 1 ^) — C a j^K s , for any < s < 1/2. 
Thus, Holder's inequality implies again that the L p -norm of the second term is 0(K~ S ), for any 

and < s < 1/2, with a constant that is independent of h and K. Note that 



p < 



± + ± 



-i 



in (4.3) we cannot exploit Galerkin orthogonality to get a doubling of the convergence rate with 
respect to K, since u and uk are solutions to two problems with different bilinear forms. □ 

As expected, these results suggest that to balance out the two error contributions, we should 
choose JQ as a power of hg. Note that a similar strategy was already suggested in the context 
of the related Brinkman problem in [19] . However, there, a certain decay rate for the FE error 
with respect to the number of KL-modes K was assumed. Here we make no such assumption and 

For the simple functional M{u) 



4.1 



instead use Proposition 

Kg > hj l . For other functionals, that satisfy Assumption Fl with t* > t, Corollary 



| u | #i(£>), Proposition 



4.1 



4.2 



implies 
implies 



that we should choose Ki > h e . If we do this, we have the following results for the multilevel 



Monte Carlo convergence rates in Theorem 2.5 



and Kf 



> hf, for all £ 



Proposition 4.3. Provided Assumption Fl is satisfied with > 

0, . . . , L, then the convergence rate of the multilevel Monte Carlo method in { 2.2 does not deteriorate 

when approximating the functio nal M w {uh e ) by Qh e ■= M u (v,K t ,hi) on eac ^ ^ eve ^ ^ n particular, 

let the assumptions of Corollary J^.2 be satisfied with p* > 2 and > p 2p ^ 2 . Then the Assumptions 

M1-M2 in Theorem\2.5\hold for any a < 1 and f3 < 2. If Assumption Fl is satisfied only for some 

(1+2U) 



U < 1/2, then K e >h 



is a sufficient condition. 
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Proof. The proof is analogous to that of Proposition 3.4 using the result in Corollary 4.2 The final 



statement follows from Remark 13.51 □ 



As before, in the presence of quadrature error (cf. Remark 3.5), we will not be able to get 
0(h 2s ) convergence for the first term in (4.2) for the approximate finite element solution UK,h- 



Due to the loss of Galerkin orthogonality for the primal problem, it is in general only possible to 
prove \Mu(u) — M u {uK,h)\ = O {h s + K~ s ) . Thus with the quadrature error taken into account 
the optimal choice is Ki > hj 1 for all functionals and we will always use that in our numerical 
tests in the next section. Higher rates of convergence can again be recovered, if the random field 
a(u>, x) is more regular. 

Let us finish this section with some comments on truncated expansions ax = exp(g , ^f) of log- 
normal fields with other covariance functions. The convergence rate of \M ul (u) — M u (uk)\ depends 
in general on the rate of decay of the KL-eigenvalues n and on the rate of growth of HV&nHoo. 
If we assume that \M w (u K ) - Mw(u K ,h)\ = 0(h s ) and \M u (u) - M u {uk)\ = 0(K~ a ), for some 
< s < 1 and < a < oo, then the number of KL-terms in a multilevel Monte Carlo method 

s_ 

on each level should satisfy Kt > he a ■ For smoother fields (e.g. with covariance functions from 
the Matern class), ^ will usually be significantly smaller than 1, and thus the number of KL-terms 
only needs to grow very slowly from level to level. 

However, the only other rigorous results regarding convergence rates for truncated expansions 
ax = exp(g^) of log-normal fields - except those for the 1-norm exponential covariance above - 
are for the case of a Gaussian covariance function 



E 



(g(u,x)-E\g{u,x)])(g(u,y)-E\g(u},y)]) = a 2 exp(-\\x - y\\ 2 / X 2 ) (4.4) 



for g with a 2 and A as in (2.3). In this case, provided the mean is sufficiently smooth, we in fact 
have £»(&;,•) € C°°(D) and 

\M u (u) - M u (u K ,h)\ < C a , Mj (h 2 + exp ( - a K x l d 

for some c\ > (cf. |6j), where d is again the spatial dimension. Thus, Kg only needs to be 
increased logarithmically with hj d in this case. 

However, all these results are asymptotic results, as ht — > 0, and thus they only guarantee 
that level-dependent truncations do not deteriorate the performance of the multilevel Monte Carlo 
method asymptotically as the tolerance e — > 0. The real benefit of using level-dependent truncations 
is in absolute terms for a fixed tolerance e, since the smoother fields potentially allow the use of 
coarser levels and thus significant gains in the absolute cost of the algorithm. In the next section, 
we see that this is in fact the case and we show the gains that are possible, especially for covariance 
functions with short correlation length A. 

4.1.1 Numerics 

To be able to deal with very short correlation lengths in a reasonable time, we start with the ID 



equivalent of model problem (3.12), on D = (0, 1). We take a to be a log-normal random field 



with 1-norm exponential covariance function (2.3), with correlation length A = 0.01 and variance 
o~ 2 = 1. We will present results for two different modelling regimes for a: one in which the number 
of modes included is fixed at K, independent of ht, and one in which the number of modes Ki is 
chosen dependent on the mesh size Ji£. In order to make the two regimes comparable, we choose 
Kg such that both regimes include the same number of modes (i.e. K$ = K) on the finest grid 
considered. 
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Figure 4: Left: Plot of E [M (1 \u h )] and |E [M^\u h ») - \M^(u h )] \, for model problem fl3l2 ) 
with d = 1, A = 0.01, a 2 = 1, K t = hj 1 , h* = 1/4096, K* = 4096 and x* = 2049/4096. Right: 
Corresponding variances V [M««)] and V [M^(u h ) - M^{u 2h )] . 
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Figure 5: Plot of cost versus 1/h for a fixed tolerance of the sampling error of 5 = 10 3 , for model 
problem ( |3.12| ) with d = 1, A = 0.01, a 2 = 1 and iQ = /i^ 1 . The quantity of interest is MW(u) 
with x* = 2049/4096. 



Figures [4| and p] show results for the point evaluation of the pressure at x = 2049/4096, i.e. 
MW(u) from pjjwith x* = 2049/4096. Similar gains can be obtained for other quantities of 
interest. 

Let us start with Figure |4j The number of modes included in the regime with a fixed number 
of modes is K = 2048, and for the level-dependent regime we choose Kg_ = hj l . The reference 
value Qh* is computed with h* = 1/4096 and K* = 4096. In the left plot, we see that even though 
dropping modes leads to a larger bias on the coarser grids, the fact that we chose Kg = K on the 
finest grid ensures that the bias is the same on this grid. It is the plot on the right that gives us 
information about the coarsest level we should include in the multilevel estimator. If we are in 
a situation where V[Qh e — > ^[Qh e ], then there is no benefit including level £ — 1 in the 

multilevel estimator, since it would only increase the cost of the estimator. Looking at the right 
plot in Figure [4j it is then clear that for the regime with a fixed number of modes on each level, 
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we should not include any levels coarser than ho = 1/64 (~ A) in the estimator, as was already 
observed in [S|. With the level-dependent regime, however, it is viable to include levels as coarse 
as ho = 1/2. This leads to significant reductions in computational cost, as is shown in Figure [5] 

In Figure [5j we fix the required tolerance for the sampling error (i.e. the standard deviation of 
the estimator) at 5 = 10 -3 , and look at how the cost of the different estimators grows as we decrease 
the mesh size h of the finest grid. The computational cost of the multilevel estimator is calculated 
as NqIiq 1 + Yle=i Ni{h^ 1 + hj\) work units, since we know that 7 = 1 in (M3) for d = 1. To 
make the estimators comparable, on each grid hi, the standard Monte Carlo estimator is computed 
with Ki modes, the "MLMC keep" estimator is computed with K = Ki modes on all levels, and 
the "MLMC drop" estimator is computed with a varying number Ki = hj 1 modes on the levels. 
We clearly see the benefit of using the level-dependent multilevel estimator. For example, on the 
grid of size h = 1/2048, the cheapest multilevel estimator with a fixed number of modes is the 4 
level estimator, which has a cost of 8.6 x 10 5 work units. The cheapest level-dependent multilevel 
estimator, on the other hand, is the 7 level estimator, whose computational cost is only 1.8 x 10 5 
units. For comparison, the cost of the standard estimator on this grid is 2.8 x 10 6 units. 

An important point we would like to make here, is that not only do the level-dependent 
estimators have a smaller absolute cost than the estimators with a fixed number of modes, they 
are also a lot more robust with respect to the coarse grids included. On the h = 1/2048 grid, the 
11 level estimator (i.e. ho = 1/2) with fixed K, costs 1.1 x 10 7 units, which is 4 times the cost of 
the standard MC estimator. The 11 level estimator with level-dependent Ki costs 2.4 x 10 5 units, 
which is only marginally more than the best level-dependent estimator (the 7 level estimator). 

For practical purposes, the real advantage of the level-dependent approach is evident on coarser 
grids. We see in Figure[5]that on grids coarser than h = 1/256, all multilevel estimators with a fixed 
number of modes are more expensive than the standard MC estimator. With the level-dependent 
multilevel estimators on the other hand, we can make use of (and benefit from) multilevel estimators 
on grids as coarse as h = 1/64. This is very important, especially in the limit as the correlation 
length A —7- 0, as eventually all computationally feasible grids will be "coarse" with respect to A. 
With the level-dependent estimators, we can benefit from the multilevel approach even for very 
small values of A. 

Let us now move on to a model problem in 2D. We will stud y the flow cell model problem (3.13 ) 
on D = (0, l) 2 , and take the outflow functional M^\u) from £3.4 as our quantity of interest. As 
in £3.5, we choose the weight function ip = x\. We choose a to be a log-normal random field with 
1-norm exponential covariance function (2.3), with A = 0.1 and a 2 = 1. 

Figure [6] is similar to Figure |4| The number of modes included in the regime with a fixed 
number of modes is K = 512, and in the level-dependent regime we include Kg = 4/i^T 1 modes on 
each level. The reference value Qh* is computed with h* = 1/256 and K* = 1024. As before, the 
coarsest level which should be included in the multilevel estimator can be estimated from the right 
plot in Figure [6j For the regime with a fixed number of modes, it is clear that no grids coarser 
than ho = 1/8 should be included in the multilevel estimator. For the level-dependent regime, it 
is viable to include grids as coarse as ho = 1/2. 

In Figure [7j we see the gains in computational cost that are possible with the level-dependent 
estimators. The results shown are calculated with a Matlab implementation on a 3GHz Intel Core 
2 Duo E8400 processor with 3.2GByte of RAM, using the sparse direct solver provided in Matlab 
through the standard backslash operation to solve the linear systems for each sample. Since we do 
not know the value of 7 in (M3) theoretically, we quantify the cost of the estimators by the CPU- 
time. On the finest grid h = 1/256, we clearly see a benefit from the level-dependent estimators. 
The cheapest multilevel estimator with a fixed number of modes is the 5 level estimator, with takes 
13.5 minutes. The cheapest level-dependent estimator is the 7 level estimator, which takes only 
2.5 minutes. For comparison, the standard MC estimator takes more than 7.5 hours. 
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Figure 6: Left: Plot of \E[M^\u h *)} | and \E[M^ (u h *) - M^'(u h )]\, for model pl3j ) problem 
with d = 2, A = 0.1, cr 2 = 1, tp = xi, K e 

variances Y[M^\u h )] and Y[M^\u h ) - M^' \u 2h )] ■ 
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Figure 7: Plot of CPU-time versus 1/h for a fixed tolerance of the sampling error of 5 = 10 , for 
model problem (3.13) with d = 2, A = 0.1, a 2 = 1 and Kg = ^hj 1 . The quantity of interest is 



M^\u), with ip 
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5 Domains with corners and discontinuous coefficients 



We now come to the last and most technical part of the paper. The first aim is to prove Theorem 2.2 
i.e. to extend the regularity results in [6] to piecewise C 2 domains. In this situation, the solution 
u can have singularities near the non-smooth parts of the boundary V, i.e. near corners in 2D and 
near corners and edges in 3D. These singularities can reduce the overall regularity of u, and hence 



need to be analysed. However, we will see in ^5.1 that under Assumptions A1-A2, this question can 



be reduced to analysing the singularities of the Laplace operator on D. We will follow [231 §5-2], 
and as in [6] we will again establish the result first "pointwise" almost surely in oj £ £1. The key 
technicality will again be to track how the constants in all the necessary estimates, in particular 
in the semi-Fredholm property of the underlying random differential operator, depend on uj. 



In £ 5.2 , we then extend the results also to the practically very important case where the 
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coefficient a(oo, x) is discontinuous. This is of interest for example in subsurface flow modelling, 
where one often deals with layered media. If a(oo, •) is piecewise Holder continuous, then the 



regularity results from Theorem |2.2| and {5.1 will still hold on each of the subdomains, but no 



longer globally on the entire domain. The aim of {5.2 is to formulate an assumption similar to 
Assumption A2, under which we can conclude on the global regularity of u also in the case of 
discontinuous coefficients. 

5.1 Regularity of random differential operators in domains with corners 

Let us recall that D was assumed to be a bounded, Lipschitz polygonal/polyhedral domain in W 1 , 
d = 2,3, and that Xa(D) G (0, 1] is the largest number such that for all < s < Aa(-D), s 7^ |, 
the Laplace operator with homogeneous Dirichlet boundary conditions is surjective as an operator 



from H 1+s (D)r\Hl(D) to H s - l {D) (cf. Definition^. As in [23 §5.2], for simplicity we actually 
consider D to be a piecewise C 2 domain and restrict ourselves for the most part to R 2 . However, 
we will also comment on the case d = 3 in Remark |5.4[ c) below. We again write the boundary T as 
r = U^jL-^Fj, where now in 2D each Tj is an open arc of curve of class C 2 , and Tj meets Tj+i at Sj 
(where we identify r m+ i and Ti). We consider only domains with boundaries that are rectilinear 
near the corners, which of course includes Lipschitz polygonal/polyhedral domains. This means 
that at each corner Sj, we can find a polygonal domain Wj C D such that the boundary dWj 
coincides with T near Sj. 

Applying the Lax-Milgram Theorem, a unique variational solution u{uj, •) G Hq(D) to our 



model problem (2.1) in the curvilinear polygon D exists, for almost all u G (i.e. for all u G £1 
with a m i n (o;) > and a max (oj) < 00). Using Assumptions A1-A3, we can conclude as in [6] that 
u G L P (Q, Hq(D)), for all p < p*. The fact that D is no longer C 2 is of no relevance here. To prove 
more spatial regularity on u, we will now follow the proof in §5.2 of |23j. 

For a given oj G CI, with a m [ n (uj) > and a max (w) < 00, we define the differential operator 

A^u = -V • (a(u, -)Vu)). 

The following key result, which is based on \28\ Theorem 5.26], is proved via a homotopy method 
in the proof of \23\ Lemma 5.2.5], for s = 1. The proof for s < 1 is analogous. 

Lemma 5.1. Let m = 1 and cj G £1. If < s < \a(D) o,nd if there exists C scmi (o;) > such that 
\\v\\ H i+'(D) < ^semiMIIAHIffs-ip), for alive H 1+s (D)nH^(D), (5.1) 
then A u is surjective from H 1+S (D) n H&(D) to H S - 1 (D). 

Thus, if we can establish ( |5.1| ), which essentially means that A w is semi-Fredholm as an operator 
from H 1+S (D) n Hq(D) to H S ~ 1 (D), for some s < Aa(-D), then we can also conclude on the 
regularity of solutions of the stochastic variational problem ( |2.1[ ) . The following lemma essentially 
follows |23l Lemma 5.2.3]. However, in the case of a random coefficient, we crucially need to make 



sure that the constant C semi (u;) in (5.1) has sufficiently many moments as a random field on To 



ensure this we need to carefully track the dependence on a in the bounds in |23l Lemma 5.2.5]. 



Lemma 5.2. Let m G N and let Assumptions Al -A2 hold for some < t < 1. Then (5.1) holds 
for all < s < t and s < \a(D)> s 7^ \> with 

C scmi (u,) := C{D) . (5.2) 

OminM 4 



In the case t = \a{D) = 1> (5.1) also holds for s = 1, i.e. for the H (D)-norm. 
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Proof. We first consider the case where m = 1 and t = Aa(-D) = 1- Note that the case m = 1 is all 
that is needed to prove Lemma 5.1 We prove the more general case m G N so that we can apply 
the bound (5.1) to any polygonal domain in the proof of Theorem 2.2 For ease of notation, we 
suppress the dependence on ui in the coefficient, and denote simply by A and a(co,x) by a(x). 

We will prove (5.1) by combining the regularity results of A in C 2 domains, with regularity 
results of the Laplace operator —A on polygonal domains. Since we assume that T is rectilinear 
near Si, we can find a polygonal domain W such that W C D and dW coincides with T near Si. 
Let v G H 2 (D) HHq(D) and let r\ be a smooth cut-off function with support in W, such that rj = 1 
near Si and then consider rjv and (1 — rj)v separately. We start with rjv. 

Let w G H 2 (W) Hi?o {W). Since Xa(D) = 1, we have for any polygonal domain W the estimate 

\\ w \\m(w) ^ \\ Aw \\l2(w) , ( 5 - 3 ) 
where the hidden constant depends only on W (cf. |23j). Hence, 

a ( s i) \\ w \\h 2 (w) ^ II^HU 2 (W) + \\ Aw ~ a ( s i) Aw\\ L 2( W) 
< \\Aw\\ L 2 (w) + |(a(-) - a(Si))Vw\ H i( W) . 

Now, using [U Lemma A. 2] (see also Theorem 6.2.25 in [25]) we get 

a(Si)\\w\\ H 2 {w) < \\Aw\\ L 2 {w) + \a\ cl{W) \w\ H i {D) + \\a - a(Si)\\ c0{W) \\Vw\\ H i^ w) . (5.4) 

Using integration by parts and the fact that ro = 0on dW, we have 

IW,< /«|Vw| ! d* = /"wV.(aVw)dx 
ivy ./V 

and so via the Cauchy-Schwarz and the Poincare inequalities 

\w\hHw) ^ Pw||z, 2 (w)- (5.5) 



Denote now by C the best constant such that (5.4) holds. Since a was assumed to be in C (W), 
we can choose W (and hence the support of rj) small enough so that 

C\\a - a(5i)|| c o (W) < - a(Si) (5.6) 



Then, substituting (5.5) and (5.6) into ( |5.4| ) and using ||Vu;||#i(w) < [| jy^r-p^j and a m ; n < a max 
we have 

0(50 \W\hHw) < 2C ( 1 + t^M ] \\Aw\\ L 2 {w) < Mcl{W) \\Aw\\ L 2 {w) . (5.7) 

\ O-min / ^min 

Since v G H 2 (D) n Hq(D) and contains the support of 77, we have r/t> G i? 2 (W) n flo(W) 
and so estimate ( |5.7[ ) applies to rjv. Thus 

II^IIh2(d) ^ - \ {w) \\ a (vv)\\l*(w)- 

a min 

Let us move on to (1 — rj)v. Let D' C D be a C 2 domain that coincides with D outside of the 
region where 77 = 1. This is always possible due to our assumptions on the geometry of D near Si. 
Then (1 — 77)7; G H 2 (D') n Hq(D'), and using [6, Proposition 3.1] we have 

11(1 - V)v\\hz(D) < 3 \\A ((1 - 77)7;) II £»(£)/). 
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Adding the last two estimates together and using the triangle inequality, we have 

\\v\\h H d) < (\\A(vv)\\mw) + — P((l - r})v)\\ L2m ) . (5.8) 

a min V "mm / 



It remains to bound the term in the bracket on the right hand side of (5.8) in terms of 11^4^11^2^-). 
Note that 

A(r]v) = rj(Av) + 2aVr/ • Vv + {Aq)v. 

Thus, applying the triangle inequality and using the fact that 77 was assumed to be smooth with 
< 7/ < 1, we get 

\\ A (vv)\\ L 2(W) ^ PHIl2(W) + a max \v\m(w) + llallci^lblli 2 ^)- ( 5 - 9 ) 

The hidden constant depends on ||V77[|£,c»mn and on || Ar)\\L°°(w)- Finally using Poincare's inequal- 
ity on all of D, as well as an elliptic estimate similar to (5.5) for v, i.e. \v\h 1 (D) < ll^ w llL 2 (D)/ a min ) 
leads to 

\\A(w)\\mw)< ° C1(B) \\Av\\ L 2 {D y 



Substituting this and the corresponding bound for ||A((1 — t]) v )\\l 2 (D') into (5.8), we finally get 

<2max|| Q ||^i/£^ 
\Mm(D) ^ 4 \\Av\\ L 2 {D) 

a min 

for all v G H 2 (D) n Hq(D). This completes the proof for the case m = 1 and t = A a = 1. 

The proof for t < 1 and/or Aa(-D) < 1 follows exactly the same lines. Instead of (5.3), we start 
with the estimate 

\\ w \\m+s(w) ^ \\^ w \\h^(w)^ ( 5 - 10 ) 

which holds for any < s < Xa(D), s ^ |, and the hidden constant depends again only on W 
(cf. [3] for example). Using |B, Lemma A.l] (see also Theorem 9.1.12 in |25|), one can derive the 
following equivalent of (5.4), for any s ^ \: 

a (Si) \\w\\h^(w) ^ WAMIhs-^w) + l°lc*(W)IH.ff 1 (WO + H a ~~ a ( S ^)\\c°(w)W^ w \\H s (w) ■ 

As before, the |u>|.H-i(vp) term can be bounded using integration by parts, Holder's inequality and 
the Poincare inequality: 

amin \w\m(W) - \\ W \\m-°(W)\\Aw\\ H s-i( W ) < II' u; IIh 1 (M / )II" 4u; II^ s - 1 (M / ) ~ \ w \m(W)\\Aw\\ H s-i( W y 

The remainder of the proof requires only minor modifications. 

The case m > 1 is treated by repeating the above procedure with a different cut-off function rjj 
at each corner Sj. Estimate (5.7) applies to rjjv, for all j = 1, . . . , m, and the regularity estimate 
for C 2 domains from [6] applies to (1 — Y^=i Vj) v - D 



Remark 5.3. Lemma 



5.2 



excludes the case s = ^. However, an inequality very similar to (5.1) 
can easily be proved also in this case. Since IMI/p+^D) < IMI-ff 1+ *(-D)j f° r an Y s < t, IMI.f/3/ 2 (.D) 
can also be bounded, as in ( pnj ), if the H X I 2 (D) -norm on the right hand side is replaced by the 
i/ 1 / 2+l5 (Z))-norm, for some 5 > 0. 



We are now ready to prove Theorem 2.2 for d = 2. For the case d = 3, see Remark |5.4|(c). 
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Proof of Theorem \2.£\ Let d = 2 and suppose u = u(u, •) is the unique solution of (2.1). Let us 
first consider the case = 0. In this case, the fact that u G H 1+S (D) n Hq(D) and the bound on 
||^||_ffi+sm) in (2.6) follow immediately from Lemmas 5.1 and 5.2, for any s < t and s < Xa(D), as 



well as for s = 1 if t = Xa(D) = 1, since / = Au. 

The case 7^ now follows from a simple trace theorem, see e.g. §1.4 in [21]. We will only 
show the proof for t = Xa(D) = 1 in detail. Due to Assumption A3 we can choose G H 2 (D) 
with \\<t)\\ H 2(D) ^ Ejli H0jllH3/2 (r .), and so / := /- A<j) G L 2 (D). Since no := u — G H^(D) we 
can apply the result we just proved for the case = to the problem ^4uo = /o to get 

IK[|jf2(£>) ^ C scmi (w) {\\Au \\ L 2 {D) + P0|| i2(D) ) 

< C scmi (w) (J|/|| L 2 (D) + ||a|| c i (S) ||0||if2(D)) , 

where in the last step we have again used [61 Lemma A. 2]. The claim of the Theorem for 0^0 
then follows by the triangle inequality. □ 

Remark 5.4. (a) The behaviour of the Laplace operator near corners is described in detail in 
|23|I24|. In particular, in the pure Dirichlet case for convex domains we always get Xa(D) = 1. 
For non-convex domains Xa(D) = min^! =1 ir/0j, where 6j is the angle at corner Sj. Hence, 
Xa(D) > 1/2 for any Lipschitz polygonal domain. 

(b) In a similar manner one can prove regularity of u also in the case of Neumann and mixed 
Dirichlet/Neumann boundary conditions provided the boundary conditions are compatible, 



like in our model problem (3.13). For example, in order to apply the same proof technique 
used here at a point where a Dirichlet and a homogeneous Neumann boundary meet, we can 
first reflect the problem and the solution across the Neumann boundary. Then we apply the 
above theory on the union of the original and the reflected domain. The regularity for the 
Laplacian is in general lower in the mixed Dirichlet /Neumann case than in the pure Dirichlet 
case. In particular, Xa(D) = murj^ ^gr in the mixed case in 2D and so full regularity (i.e. 
Xa(D) = 1) is only possible, if all angles are less than ir/2. For an arbitrary Lipschitz 
polygonal domain we can only guarantee Xa(D) > 1/4. 

(c) The 3D case is similar, but in addition to singularities at corners (for which the analysis is 
identical to the above) we also need to consider edge singularities. This is a bit more involved 
and we refer to [231 §8.2.1] for more details. However, provided D is convex, we obtain again 
Xa(D) = 1 always in the pure Dirichlet case. 

5.2 Discontinuous coefficients 

We now shift our attention from the domain D to the random coefficient o(w,x). In practice, one 
is often interested in models with discontinuous coefficients, e.g. modelling different rock strata 
in the subsurface. Such coefficients do not satisfy Assumption A2, and the regularity results from 



Theorem 2.2 can not be applied directly. However, this loss of regularity is confined to the interface 
between different strata and it is still possible to prove a limited amount of regularity even globally. 
Let us consider (2.1) on a Lipschitz polygonal domain D C M 2 that can be decomposed into 



disjoint Lipschitz polygonal subdomains D^, k = 1, . . . , K. Let PC l (D) C L OQ (D) denote the space 
of piecewise C* functions with respect to the partition {Dk}^ =l (up to the boundary of each region 
-D/c). We replace Assumption A2 by the following milder assumption on the coefficient function a: 

A2*. a G L p (tt, PC*(B)), for some < t < 1 and for all p G (0, oo). 
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Our regularity results for discontinuous coefficients rely on the following result from j2,3i I3T] . 
The proof of this result uses the fact that for 0<s<l/2,u>G H S (D{) if and only if the extension 
w of w by zero is in H s (M, d ). 

Lemma 5.5. Let v G H 1 ^) and s < 1/2, and suppose that v G H 1+s {D k ), for all k = 1, . . . , K. 
Then v G H 1+S (D) and 

K 

\\ v \\m+s(D) = \\ v \\m(D) + ^ M-ff 1+s (A0 • 

k=l 

Thus, we cannot expect more than j^ 3 / 2-<5 \D) regularity globally in the discontinuous case. 
However, as in the case of continuous fields, the regularity of the solution will also depend on 
the parameter t in Assumptions A2* and A3 (i.e. on the Holder/ Sobolev regularity of a and /, 
respectively), as well as on the behaviour of A u at any singular points. Since Lemma 5.5 restricts 
us to s < 1/2 and since Aa(-D) > 1/2 for any Lipschitz polygonal D in the case of a pure Dirichlet 
problem, we do not have to worry about corners. Instead we define the set of singular (or cross) 
points S x := {Sf : t = 1, . . . , L} to consist of all points S£ in D where three or more subdomains 
meet, as well as all those points Sf on dD where two or more subdomains meet. By the same 



arguments as in £5.1 the behaviour of A u at these singular points is again fully described by 
studying transmission problems for the Laplace operator, i.e. elliptic problems with piecewise 
constant coefficients, locally near each singular point (cf. [30"1 ®1 l3Tj ) . 

Definition 5.6. Denote by T(a\, . . . , ax) the operator corresponding to the transmission problem 
for the Laplace operator with (constant) material parameter a k on subdomain D^, k = 1, . . . ,K. 
Let < Xt(D) < 1 be such that T(a±, . . . , olk) is a surjective operator from H 1+S {D) n Hq(D) to 
H S ~ 1 (D), for any choice of a±, . . . ,ax and for s < \x(D), s ^ 1/2. In other words, Xt(D) is a 
bound on the order of the strongest singularity of T(a%, . . . , or-). 

Without any assumptions on the partition {D k }^ =1 or any bounds on the constants {ctk\k=i 
it is in general not possible to choose Xt(D) > 0. However, if no more than three regions meet at 
every interior singular point and no more than two at every boundary singular point, then we can 
choose Xt{D) < 1/4. If in addition each of the subregions is convex, then we can choose any 



Xt(D) < 1/2, which due to the restrictions in Lemma 5.5 is the maximum we can achieve anyway. 
See for example [301 13 EI] f° r details. 



The following is an analogue of Theorem 2.2 on the regularity of the solution u of (2.1) for 
piecewise C* coefficients. All the other results on the finite element convergence error discussed 
above follow of course again from this. 

Theorem 5.7. Let Dei 2 be a Lipschitz polygonal domain and let Xt(D) > 0. Suppose Assump- 
tions Al, A2* and A3 hold with t < 1. Then, the solution u of ((2lj| is in L p (n,H 1+s (D)), for 
any < s < t such that s < Xt(D) and for all p < p*. 

Proof. Let us first consider <fi = again. Then, the existence of a unique solution u(co, •) G i7 1 (D) 



of (2.1) follows again from the Lax-Milgram Theorem, for almost all ui G Q. Also note that 
restricted to the transmission operator T(a±, . . . , ax) = for all k = 1, K. Therefore, 

using Assumption A2* we can prove as in £ |5.1| via a homotopy method that u(ui,-) restricted to 
Dk is in H 1+s {Dk), for any s < t and s < Xt{D), for almost all uj G f2. The result then follows 
from Lemma |5.5| and an application of Holder's inequality. The case <j> ^ follows as in the proof 



to Theorem 12. 21 via a trace estimate. □ 

As an example of a random coefficient that satisfies Assumption A2* for any t < 1/2, we can 
consider a piecewise log-normal random field a = exp(g) such that g\o k '■= <7fe, for all k = 1, . . . , K , 
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where each g k is a Gaussian random field with mean fj, k (x) and exponential covariance function 



E 



{g k {u,x) - fj, k (x)])(g k (u;,y) - ^ k (y)) = a\ exp(-||x - y\\/X k ). 



In a similar manner, if we let each g k be a Gaussian field with mean [x k S C 1 (Z?) and Gaussian 
covariance function 



E 



(g k (u),x) - fi k {x)])(g k (u),y) - /j, k (y)) = a k exp(-\\x - y\\ /\%). 



we have Assumption A2* is satisfied for any t < 1. The mean n k {x), the variance cr 2 , and the 
correlation length X k can be vastly different from one subregion to another. 

5.2.1 Numerics 

A rock formation which is often encountered in applications is a channelised medium. To simulate 
this, we divide D into 3 horizontal layers, and model the permeabilities in the 3 layers by two dif- 
ferent log-normal distributions. The middle layer, which has a higher mean permeability, occupies 
the region {1/3 < X2 < 2/3}. The parameters in the top and bottom layer are taken to be fj,\ = 0, 
Ai = 0.3 and a\ = 1, and for the middle layer we take \X2 = 4, A2 = 0.1 and a 2 = I (assuming no 



correlation across layers). As a test problem we again choose the flow cell model problem (3.13) on 
the unit square D = (0, l) 2 . Samples from fields with exponential covariance are produced using 
the circulant embedding technique already used in §3.5| Fields with Gaussian covariance are ap- 
proximated by truncated Karhunen-Loeve expansions. The eigenpairs of the covariance operator 
are computed numerically using a spectral collocation method. 

Figures [8] and [9] s how results for fields with exponential and Gaussian covariance functions, 



respectively. Theorem 5.7 in both cases suggests a global spatial regularity of H 1 ^ 2 ^ S (D), for any 
5 > 0. For fields with exponential covariance function, this is the same global regularity as in 
the case of continuous coefficients (satisfying A2), and convergence rates of the finite element error 
should not be affected by the discontinuities. For fields with Gaussian covariance function, however, 
continuous coefficients give a global regularity of H 2 (D), and so the discontinuities should lead to 
lower convergence rates. 

The numerical results confirm this observation. For comparison, we have in Figures H| and 
[9] added the graphs for the case where there is no "channel", i.e. the permeability field is one 
continuous log-normal field with fj, = [i\ = 0, A = Ai = 0.3 and a 2 = a\ = 1. As expected, 
in Figure [8j we observe 0(h 1 / 2 ) convergence of the if 1 (D)-seminorm of the error, and linear 
convergence of the L 2 (D)-norm of the error, for both permeability fields. In Figure [9J however, we 
indeed observe the slower convergence rates for the layered medium. Whereas we observe 0{h 1 / 2 ) 
convergence of the i? 1 (D)-seminorm, and linear convergence of the L 2 (I?)-norm for the layered 
medium, we have linear convergence of the H 1 (D)-seminorm, and quadratic convergence of the 
L 2 (Z})-norm for the continuous permeability field. Since the slower convergence rates are caused 
by singulartities at the interfaces, one could of course use local mesh refinement near the interfaces 
in order to recover the faster convergence rates also for the layered medium. 



6 Conclusions and Further Work 

Multilevel Monte Carlo methods have the potential to significantly outperform standard Monte 
Carlo simulations in a variety of contexts. In this paper, we considered the application of multilevel 
Monte Carlo methods to elliptic PDEs with random coefficients, in the practically relevant and 
technically more demanding case of log-normal random coefficients with short correlation lengths 
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Figure 8: Left: Plot olE[\u h 
exponential covariance, with \i = fi\ 
h* = 1/256. Right: Plot of E [\\uh* 
line is —1/2 (resp. —1). 



u h\H 1 (D)] versus 1/h for model problem (3.13) with 2-norm 
= 0, /x 2 = 4, A = Ai = 0.3, A 2 = 0.1, a 2 = of = of = 1 and 
u fe[|l, 2 (Z))] • The gradient of the dash-dotted (resp. dotted) 




= 4, A = Ai 



Figure 9: Left: Plot of E[\u h * 
covariance, with fj, = m = 0, fj,% 
K* = 170. Right: Plot of E [\\uh* - u h \\ L 2 {D) 
dashed) line is —1/2 (resp. —1 and— 2). 



versus 1/h for model problem (3.13) with Gaussian 
= 0.3, A 2 = 0.1, a 2 = o\ = a\ = 1, h* = 1/256 and 
1 . The gradient of the dash-dotted (resp. dotted and 



where realisations of the diffusion coefficient have limited regularity and are not uniformly bounded 
or elliptic. We extended the theory from [6] to cover more difficult model problems, including corner 
domains and discontinuous means, and we offered one possible remedy for the problem of correlation 
length dependent coarse mesh size restrictions in the standard multilevel estimator. This was done 
by using level-dependent truncations of the Karhunen-Loeve expansion of the coefficient, resulting 
in smoother approximations of the coefficient on the coarser levels. 

An issue we plan to investigate further in the future, is how to achieve smoother approximations 
of the random coefficient on the coarser grids also using other sampling techniques, such as in the 
circulant embedding method used in ^3j Another area of future research is the adaptive choice of 
spatial grids in the multilevel estimator. 
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